The following provides a description of single-cell RNA-seq analysis of motorneurons collected and sequenced from Crotalus atrox in 2018.
In this analysis, I examined expression at the *gene level* between locomotor and spinal cord cells to determine genes that were differentially expressed, using the tiger rattlesnake as a reference. We see 26 differentially expressed genes, which includes some interesting candidates, but no potassium channels, somewhat consistent in magnitude with an earlier analysis.
A final plot compares genes elected in another analysis for qPCR, showing that the ss-rnaseq data is consistent with those results.
<BORIS NEEDS TO ADD METHODS DETAILS HERE>
Boris delivered 2 plates with 54 locomotor samples and 78 shaker samples to Alexander Janjic in May 2018 see (Boris_June2018.pdf). Elsewhere, the laboratory notes say that there were 54 locomotor cells and 62 shaker cells. In the notes, it says During centrifugation, 3 locomotor and 16 shaker samples were lost due to an accident. These numbers don’t add up.
Most plausibly, it looks like there should be 51 locomotor cells and 62 shaker cells in in the experiment.
Library was prepared using the mcSCRB-seq protocol v 1.1 (JWB Bagnoli et al. 2018).
i7 index N714 was used for locomotor cell plate, i7 index N715 was used for shaker cell plate.
Barcode sets per cell were unknown, requiring us to digitally demultiplex the cells…
| Index | Name | Nextera | Phenotype |
|---|---|---|---|
| GCTCATGA | Boris1 | N714 | Locomotor |
| ATCTCAGG | Boris2 | N715 | Shaker |
Other laboratory specific notes are in the file Boris_June2018.pdf set from Alexander Janjic in June 2018.
The samples were then pooled and run on 0.5 lanes of XX instrument. (NEED MORE DETAILS FROM ALEXSANDAR, if he remembers…!)
Raw single cell data was processed using the zUMIs pipeline (Parekh et al. 2018), which filters reads with low-quality barcodes and UMIs, and then uses the STAR aligner v2.7.3a (Dobin et al. 2012) to align reads to the Croatuls tigris genome (NCBI RefSeq GCF_016545835.1) (Margres et al. 2021). zUMIs then predicts cell barcodes and collapses UMIs to create a read count table for downstream analysis.
Read count tables were then analyzed using the (Hao et al. 2021) package to determine expression differences between cell pools.
All code for data analysis can be found at (https://github.com/msuefishlab/INSERT_REPO_NAME)
rps<-read.table(file.path(root,"output_data/single_snakes/zUMIs_output/stats/single_snakes.readspercell.txt"),header = T)
tc<-data.frame(rps)
tc$type<-factor(tc$type)
tc$plate<-as.factor(substr(tc$RG,7,14))
tc$cell<-as.factor(substr(tc$RG,0,6))
levels(tc$plate)[1] <- "Bad"
levels(tc$plate)[2] <- "ATCTCAGG"
levels(tc$plate)<-c("Bad","Shaker","Locomotor")
featColors<-c("#1A5084", "#914614" ,"#118730","grey33","tan1","#631879FF","gold1","grey73","firebrick3")
names(featColors)<-c("Exon","Intron+Exon","Intron","Unmapped","Ambiguity","MultiMapping","Intergenic","Unused BC","User")
tc %>%
filter(plate != "Bad") %>%
filter(type != "Unmapped") %>%
ggplot(aes(y=N,fill=plate))+
geom_density(alpha=0.7) +
xlab("") +
ylab("log10(Number of reads)") +
scale_y_log10()+
coord_flip()+
scale_fill_manual(values=mycolors)+
geom_hline(yintercept = 1e04)
Based on filtering out cells with < 10,000 reads, we get a number that looks similar to the number of input cells, with 10 more locomotor cells then we are supposed to have… let’s dig into this a bit more.
tc %>% filter(plate != "Bad" & N > 1e+04 & type != "Unmapped") %>%group_by(cell,plate) %>% summarise(Total=sum(N)) %>% group_by(plate) %>% summarise(n=n()) %>% kbl() %>% kable_styling()
| plate | n |
|---|---|
| Shaker | 63 |
| Locomotor | 60 |
zumis_output <- readRDS(file.path(root,"output_data/single_snakes/zUMIs_output/expression/single_snakes.dgecounts.rds"))
umis <- as.matrix(zumis_output$umicount$inex$all)
seu <- Seurat::CreateSeuratObject(counts = umis)
seu@meta.data$celltype<-as.factor(substr(row.names(seu@meta.data),7,14))
levels(seu@meta.data$celltype)<-c("shaker","shaker","locomotor")
Right out of the gate, zUMIs reduces the 192 possible barcodes from our file to to 162 potential cells (equally distributed between the two phenotypes). We know that there were less cells than this, suggesting that some of these cells represent “noise”…
seu@meta.data %>% group_by(celltype) %>% summarise(n=n()) %>% kbl() %>%
kable_styling()
| celltype | n |
|---|---|
| shaker | 81 |
| locomotor | 82 |
We can see right away that most cells are ~ 20% ribosomal RNA, but locomotor cells have a long tail, suggesting some “bad” cells with low ribosomal RNA in them…
#grep("^RP[LS]",rownames(seu@assays$RNA@counts),value = TRUE)
PercentageFeatureSet(seu,pattern="^RP[LS]|LOC120298527|LOC120298533") -> seu$percent.Ribosomal
seu@meta.data %>%
ggplot(aes(x=percent.Ribosomal, color = celltype, fill=celltype)) +
geom_density(alpha = 0.7) +
theme_classic()+
scale_fill_manual(values=mycolors)+
geom_vline(xintercept=12)
apply(
seu@assays$RNA@counts,
2,
max
) -> seu$largest_count
apply(
seu@assays$RNA@counts,
2,
which.max
) -> seu$largest_index
rownames(seu)[seu$largest_index] -> seu$largest_gene
100 * seu$largest_count / seu$nCount_RNA -> seu$percent.Largest.Gene
Again, we can see a pretty sharp bimodal distribution with a large number of cells < 10,000 reads and another large group > 10,000 reads…
seu@meta.data %>%
ggplot(aes(x=nCount_RNA, color = celltype, fill=celltype)) +
geom_density(alpha = 0.7) +
theme_classic()+
scale_x_log10()+
scale_fill_manual(values=mycolors)+
geom_vline(xintercept=1.4e04)
Let’s look at the number of unique genes found in each sample. Here we can see that shaker cells have a strange distribution: one population has about 1000 unique features which is not present in locomotor cells at the same frequency. In the majority of cells we detect approximately 7500 unique features (genes)…
seu@meta.data %>%
ggplot(aes(x=nFeature_RNA, color = celltype, fill=celltype)) +
geom_density(alpha = 0.7) +
theme_classic()+
scale_x_log10()+
scale_fill_manual(values=mycolors)+
geom_vline(xintercept=3000)
Based on these values, we have a reasonable set of filters to propose:
filtered_seu <- subset(
seu,
nCount_RNA>1.4e04 &
nFeature_RNA > 3000 &
percent.Ribosomal > 12)
filtered_seu@meta.data %>% group_by(celltype) %>% summarise(n=n()) %>% kbl() %>%
kable_styling()
| celltype | n |
|---|---|
| shaker | 45 |
| locomotor | 45 |
This gives us a total of 90 cells to examine, let’s take a look at the distributions of cells a bit more closely (turning off log scales now…)
filtered_seu@meta.data %>%
ggplot(aes(x=nCount_RNA, color = celltype, fill=celltype)) +
geom_density(alpha = 0.7) +
theme_classic()+
scale_fill_manual(values=mycolors)+
geom_vline(xintercept=1.4e04)+
geom_vline(xintercept=1.25e05)
What I see here is a bit of “ringing” in both samples with very high read counts, which I think are referred to as “doublets” in ss-RNAseq land. Let’s get rid of those now too!
filtered_seu <- subset(
seu,
nCount_RNA>1.4e04 &
nFeature_RNA > 3000 &
percent.Ribosomal > 12 &
nCount_RNA<1.25e05 )
filtered_seu@meta.data %>% group_by(celltype) %>% summarise(n=n()) %>% kbl() %>%
kable_styling()
| celltype | n |
|---|---|
| shaker | 41 |
| locomotor | 34 |
This gives us a grand total of 75 cells. Let’s have a look at the distributions of our QC data in this filtered dataset.
Idents(filtered_seu) <- 'celltype'
qc_plots<-VlnPlot(filtered_seu,features=c("nFeature_RNA", "nCount_RNA","percent.Ribosomal"),alpha(0.5),combine=FALSE)
for(i in 1:length(qc_plots)) {
qc_plots[[i]] <- qc_plots[[i]] + scale_fill_manual(values=mycolors)+ theme(legend.position = 'none')
}
CombinePlots(qc_plots)
The distributions of our QC data seem to mostly overlap and make good biological sense. Let’s proceeed!
What are the most variable genes in our dataset, potentially prediciting what cell type cluster they belong to?
filtered_seu <- NormalizeData(filtered_seu, normalization.method = "RC", scale.factor = 1e6)
filtered_seu <- FindVariableFeatures(filtered_seu, selection.method = "vst", nfeatures = 2000)
top10 <- head(VariableFeatures(filtered_seu), 10)
plot1 <- VariableFeaturePlot(filtered_seu)
plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE)
plot2
ScaleData(filtered_seu,features=rownames(filtered_seu)) -> filtered_seu
RunPCA(filtered_seu,features=VariableFeatures(filtered_seu)) -> filtered_seu
DimPlot(filtered_seu,reduction="pca",group.by="celltype") + scale_color_manual(values=mycolors)
I’m not seeing a super clean separation between shaker and locomotor cells, but there is sort of a trend here…
Idents(filtered_seu) <- 'celltype'
markers<-FindMarkers(filtered_seu,ident.1 = "shaker", min.pct = 0.25)
sig_genes<-subset(markers, p_val_adj<0.1)
sig_genes$id<-rownames(sig_genes)
sig_genes<-as.data.frame(sig_genes)
sig_list<-merge(sig_genes,subset(gtf,type=="transcript") %>% group_by(gene) %>% filter(!duplicated(gene)) %>% select(gene,product),by.x="id",by.y="gene",all.x=TRUE)
sig_list[,c(1,7,2,3,4,5,6)] %>% arrange(p_val_adj) %>% kbl() %>% kable_styling()
| id | product | p_val | avg_log2FC | pct.1 | pct.2 | p_val_adj |
|---|---|---|---|---|---|---|
| LOC120298205 | uncharacterized LOC120298205 | 0.0e+00 | -6.587978 | 0.024 | 1.000 | 0.0000000 |
| PDE7A | phosphodiesterase 7A, transcript variant X1 | 0.0e+00 | -4.613105 | 0.317 | 1.000 | 0.0000000 |
| LPAR6 | lysophosphatidic acid receptor 6 | 0.0e+00 | -3.783138 | 0.073 | 0.912 | 0.0000001 |
| LOC120317132 | galectin-6-like | 0.0e+00 | -4.390212 | 0.073 | 0.824 | 0.0000010 |
| LOC120318694 | uncharacterized LOC120318694, transcript variant X1 | 0.0e+00 | 3.398786 | 0.927 | 0.441 | 0.0000022 |
| LOC120319595 | vomeronasal type-2 receptor 26-like | 0.0e+00 | -4.527950 | 0.098 | 0.794 | 0.0000048 |
| LOC120303167 | splicing factor SWAP, transcript variant X1 | 0.0e+00 | 2.144733 | 1.000 | 0.853 | 0.0000089 |
| EPB41L3 | erythrocyte membrane protein band 4.1 like 3, transcript variant X5 | 0.0e+00 | -1.003199 | 1.000 | 1.000 | 0.0001171 |
| TUBB | tubulin beta class I | 0.0e+00 | -1.395067 | 0.927 | 1.000 | 0.0001508 |
| AK5 | adenylate kinase 5, transcript variant X1 | 0.0e+00 | -2.764053 | 0.390 | 0.912 | 0.0001856 |
| ABCC9 | ATP binding cassette subfamily C member 9, transcript variant X2 | 0.0e+00 | 1.454360 | 1.000 | 0.941 | 0.0002655 |
| LOC120319596 | vomeronasal type-2 receptor 26-like | 1.0e-07 | -1.379046 | 0.878 | 1.000 | 0.0015383 |
| IKZF1 | IKAROS family zinc finger 1, transcript variant X2 | 2.0e-07 | -2.272177 | 0.366 | 0.824 | 0.0043207 |
| PTPRM | protein tyrosine phosphatase receptor type M, transcript variant X1 | 2.0e-07 | -1.218622 | 1.000 | 1.000 | 0.0049298 |
| LOC120307612 | tubulin beta-2 chain | 3.0e-07 | -1.238360 | 0.976 | 1.000 | 0.0069229 |
| SDC2 | syndecan 2 | 4.0e-07 | -2.192644 | 0.488 | 1.000 | 0.0085474 |
| SEZ6 | seizure related 6 homolog, transcript variant X3 | 6.0e-07 | 1.452948 | 1.000 | 0.824 | 0.0120074 |
| NALCN | sodium leak channel, non-selective, transcript variant X3 | 1.3e-06 | -2.108295 | 0.634 | 0.912 | 0.0263756 |
| HOXD9 | homeobox D9 | 1.3e-06 | 2.138052 | 0.902 | 0.412 | 0.0265779 |
| LZTR1 | leucine zipper like transcription regulator 1, transcript variant X2 | 2.5e-06 | -1.564748 | 0.561 | 0.941 | 0.0509420 |
| CREB5 | cAMP responsive element binding protein 5, transcript variant X7 | 2.6e-06 | -1.722398 | 0.707 | 0.971 | 0.0538470 |
| TRNAA-AGC-9 | NA | 2.9e-06 | -4.533654 | 0.000 | 0.441 | 0.0603690 |
| TPPP3 | tubulin polymerization promoting protein family member 3, transcript variant X1 | 3.0e-06 | -1.260155 | 0.854 | 1.000 | 0.0630456 |
| LOC120301339 | 5S ribosomal RNA | 3.8e-06 | -3.370541 | 0.244 | 0.765 | 0.0792824 |
| TUBB2A | tubulin beta 2A class IIa | 4.0e-06 | -1.463318 | 0.756 | 0.941 | 0.0825205 |
| CNTROB | centrobin, centriole duplication and spindle assembly protein, transcript variant X4 | 4.0e-06 | 2.071961 | 0.829 | 0.294 | 0.0839091 |
I’ve been able to identify 26 DE genes between shaker and locomotor cells. Of particular interest might be the following: NALCN (sodium leak channel) and splicing factor SWAP. Here are expression plots for all 26 genes.
myfeatures<-sig_list$id
plots <- VlnPlot(filtered_seu,group.by="celltype", split.by = "celltype",features=myfeatures,combine=FALSE)
for(i in 1:length(plots)) {
plots[[i]] <- plots[[i]] + geom_boxplot(alpha=0.5) + theme(legend.position = 'none') + scale_fill_manual(values=mycolors)
}
CombinePlots(plots)
Here are the genes examined in our qPCR assay:
goi<-c("HOXD9","KCNQ3","KCNQ2","LOC120319564")
plots <- VlnPlot(filtered_seu,group.by="celltype", split.by = "celltype",features=goi,combine=FALSE)
for(i in 1:length(plots)) {
plots[[i]] <- plots[[i]] + geom_boxplot(alpha=0.2) + theme(legend.position = 'none') + scale_fill_manual(values=mycolors)
}
CombinePlots(plots)
Finally, CHAT appears to be a good marker for motorneurons, it’s highly expressed and seems to be uniformly abundant in both samples…
goi_motorneuron_markers<-c("CHAT")
plots <- VlnPlot(filtered_seu,group.by="celltype", split.by = "celltype",features=goi_motorneuron_markers,combine=FALSE)
for(i in 1:length(plots)) {
plots[[i]] <- plots[[i]] + geom_boxplot(alpha=0.2) + theme(legend.position = 'none') + scale_fill_manual(values=mycolors)
}
CombinePlots(plots)
Here are the additional genes requested by Max:
goi<-c("ABCC8","KCNJ8","KCNJ11")
plots <- VlnPlot(filtered_seu,group.by="celltype", split.by = "celltype",features=goi,combine=FALSE)
for(i in 1:length(plots)) {
plots[[i]] <- plots[[i]] + geom_boxplot(alpha=0.2) + theme(legend.position = 'none') + scale_fill_manual(values=mycolors)
}
CombinePlots(plots)
Here are the additional genes requested by Boris:
goi<-c("NALCN")
plots <- VlnPlot(filtered_seu,group.by="celltype", split.by = "celltype",features=goi,combine=FALSE)
for(i in 1:length(plots)) {
plots[[i]] <- plots[[i]] + geom_boxplot(alpha=0.2) + theme(legend.position = 'none') + scale_fill_manual(values=mycolors)
}
CombinePlots(plots)
Michigan State University Department of Integrative Biology, jgallant@msu.edu↩︎